Abstract

A toy model of a neural network in which both Hebbian learning and reinforcement
learning occur is studied. The problem of 'path interference', which causes the
neural net to quickly forget previously learned input-output relations, is tackled by
adding a Hebbian term (proportional to the learning rate $\eta$) to the reinforcement
term (proportional to $\rho$) in the learning rule. It is shown that the number of learning
steps is reduced considerably if $1/4 < \eta/\rho < 1/2$, i.e., if the Hebbian term is neither
too small nor too large compared to the reinforcement term.
1 Introduction



The central question which we address in this article is in what way a biological
neural network, i.e., the brain, or, more generally, a part of the nervous system
of an animal or human being, may learn to realize input-output relations. By
biological we here mean: realizable with the help of elements occurring in
nature, e.g., neurons or chemical substances that may influence other neurons
or the synaptic efficacy.

An example of an input-output relation is a motor task, like catching prey,
in reaction to visual, auditory, or other input. Many attempts to explain the
way input-output relations of this kind might be realized by (artificial) neural
nets are encountered in the literature, most of which are not satisfactory from
a biological point of view, as we will illustrate in subsection 1.1.

It is the purpose of this article to combine ideas which do satisfy certain 
biological constraints and study a toy model, in particular with respect to its 
ability to learn and realize input-output relations. 

The widely accepted idea of Hebbian learning [5] on the one hand will be
combined with some rule that implements a feedback signal on the other hand,
in a way that, in principle, might be biologically realizable. Without the
addition of any feedback signal, learning of prescribed input-output relations,
whether in reality or in a model, is, of course, impossible.

1.1 Artificial learning rules 

If one wants a network to learn to realize input-output relations, there are
various well-known prescriptions, associated with names like perceptron learning
rule, back-propagation or Boltzmann machines [12,7]. None of these, however,
models the functioning of real brains, since the learning rules in question
violate the existing biological limitations. In order to illustrate this statement,
let us give an example.

Consider a single-layered feed-forward network, i.e., a network consisting of
an input and an output layer only, in which signals are sent by neurons of the
input layer to neurons of the output layer, and not the other way around. Let
$w_{ij}$ be the strengths or 'weights' of the connections in this simple net. In 1962,
Rosenblatt [13] proved that such a network will realize desired input-output
relations if, a finite number of times, the weights are adapted according to the
rule

$$w_{ij} \to w_{ij} + \Delta w_{ij} \tag{1}$$

with

$$\Delta w_{ij} = \varepsilon (x_{Ti} - x_{Oi}) x_j \tag{2}$$



where $x_{Ti}$ is the desired or target output of neuron $i$, and $x_{Oi}$ is its actual
output. Furthermore, $x_j$ is the state of the pre-synaptic input neuron $j$ and $\varepsilon$
is some function of the neuron states and properties of neurons $i$ and $j$. This
learning rule cannot be realized by a biological neural net since neuron $i$,
producing $x_{Oi}$, cannot know that it should produce $x_{Ti}$. If, e.g., an animal does
not succeed in catching prey, its neurons get no specific feedback individually
on what the right output $x_{Ti}$ should have been. Hence, $x_{Ti} - x_{Oi}$ cannot be
determined by the biological system, and, therefore, neither can it adapt the
weights according to (2). Consequently, the perceptron learning rule (2) is
unsuitable for a realistic modeling of the way in which a biological neural net
can learn and realize input-output relations. Similar observations can be made
for back-propagation or Boltzmann machines.
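
To make the objection concrete, rule (1)-(2) can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the function
name `perceptron_step` is hypothetical, the pre-factor is taken to be a
constant `eps` (the text allows a more general function), and an output neuron
is assumed to fire when its summed input is positive. The point to notice is
that the update needs the individual target `x_target[i]` of every output
neuron, which is the information a biological synapse does not have.

```python
def perceptron_step(w, x_in, x_target, eps=0.1):
    """One application of rule (1)-(2) to a single-layer net.

    w[i][j] is the weight from input neuron j to output neuron i.
    Returns the actual output that was computed before the update.
    """
    # Actual output: neuron i fires if its summed input is positive
    # (zero threshold is an assumption made for this sketch).
    x_out = [1 if sum(wij * xj for wij, xj in zip(row, x_in)) > 0 else 0
             for row in w]
    for i, row in enumerate(w):
        err = x_target[i] - x_out[i]      # requires the target of neuron i
        for j, xj in enumerate(x_in):
            row[j] += eps * err * xj      # equation (2)
    return x_out
```

Calling the function repeatedly on the same input leaves the weights unchanged
as soon as the actual output equals the target, since the error factor is then
zero.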

1.2 Biological learning rules; Hebbian learning and reinforcement learning 

Already in 1949, Hebb suggested [5] that, in biological systems, learning takes
place through the adaptation of the strengths of the synaptic interactions
between neurons, depending on the activities of the neurons involved. In a
model using binary neurons, i.e., $x_i = 0$ or $x_i = 1$, the most general form of
a learning rule based on this principle is a linear function in $x_i$ and $x_j$, since
$x_i^2 = x_i$ and $x_j^2 = x_j$. It therefore reads

$$\Delta w_{ij} = a_{ij} + b_{ij} x_i + c_{ij} x_j + d_{ij} x_i x_j \tag{3}$$

In a biological setting, the coefficients $a_{ij}$, $b_{ij}$, $c_{ij}$ and $d_{ij}$ in this learning rule
can only depend on locally available information, such as the values of the
membrane potential

$$h_i = \sum_j w_{ij} x_j \tag{4}$$

and the threshold potential $\theta_i$ of neuron $i$. In this way, the system adapts its
weights without making use of neuron-specific information, like, e.g., $x_{Ti}$, of
which there can be no knowledge, locally, at the position of the synapse.

In a recurrent neural net, a Hebbian learning rule suffices to store patterns
[6,12] if all neurons are clamped to the patterns which are to be learned during
a certain period, the 'learning stage'. In feed-forward networks, however, only
the neurons of the input layer are clamped, and some kind of feedback signal,
governing the direction of adaptation of the synapses during the learning
procedure, is indispensable. Probably the simplest form of such a signal one
can think of is a 'success' or 'failure' signal that is returned to the network
after each attempt to associate the correct output to given input. On the basis
of trial and error, a neural network can then indeed learn certain tasks, the
principle on which 'reinforcement learning' is based [3,14]. This principle of
reinforcement learning has a rather natural interpretation: satisfactory behavior
is rewarded, or reinforced, causing this behavior to occur more frequently. The
reinforcement signal is supplied by the subject's environment, or by its own
judgment of the effect of its behavior. In a biological perspective, one could
think of the synaptic change being influenced by some chemical substance,
which is released depending on whether the evaluation by the subject of the
effect of the output is positive or negative, i.e., whether it is happy or unhappy
with the attempt it made.

Note that, in learning by reinforcement, the search for the correct output 
is more difficult, and, hence, slower, than for non-biologically realizable al- 
gorithms like the perceptron learning rule or back-propagation. This is not 
surprising, since the latter give the system locally specific information on how 
to adjust individual weights, while reinforcement rules only depend upon a 
global 'measure of correctness'. 

The most general form of a reinforcement learning algorithm is given by the 
prescription 

$$\Delta w_{ij} = a_{ij}(r) + b_{ij}(r) x_i + c_{ij}(r) x_j + d_{ij}(r) x_i x_j \tag{5}$$

Here the coefficients $a_{ij}$, $b_{ij}$, $c_{ij}$, $d_{ij}$, besides their dependence on the local
variables such as the membrane potential, will in principle also depend on a
reinforcement signal, denoted by $r$. The value of $r$ is usually a real number between
0 and 1, denoting the degree of success ($r = 1$ means success, $r = 0$ means
failure).

An important issue in the literature on reinforcement learning is the so-called
'credit assignment problem' [14]. It refers to the question of how a neural net
knows which connections $w_{ij}$ were responsible for a successful or unsuccessful
trial, and, as a consequence, which connections should be 'rewarded', and
which should be 'punished', respectively.

In their article 'Learning from mistakes' (1999), Chialvo and Bak [4] proposed
a class of networks in which learning takes place on the basis of a
'deinforcement signal' only, i.e., the weights of active synapses are decreased
if the output is wrong: they are 'punished', so to say, in case of wrong
performance of the network. If the output is right, nothing happens. This procedure
works as long as the average activity in the net is kept very low: when only a
few neurons are active at an unsuccessful attempt, one can be sure that the
connections between these active neurons were the ones which were responsible,
and thus should be 'punished'. In this way Chialvo and Bak obtained an
elegant solution to the credit assignment problem.

The absence of a reinforcement signal (nothing happens if $r = 1$) makes their
learning rule relatively simple. It is a version of the general rule (5) with
$b_{ij} = 0$ and $c_{ij} = 0$:

$$\Delta w_{ij} = -(1 - r)(\rho x_i x_j - \varphi) \tag{6}$$

where $\rho$ and $\varphi$ are positive constants; in this article we will suppose $\varphi \ll \rho$.
A biological mechanism that could implement the learning rule (6) is the
following: if the output is correct, nothing happens, since the network obviously
performs satisfactorily. If not, a chemical substance is released, which has the
effect that synapses between neurons that have just been active, and thereby
are 'tagged' in some electro-chemical way, are depressed.

1.3 Purpose 

The success of the 'minibrain model' of Chialvo and Bak [4] (as Wakeling
and Bak referred to it in [15]) is limited to feed-forward neural nets in which
the number of input and output neurons (or, equivalently in this model, the 
number of patterns) is small compared to the number of neurons in the hidden 
layer. As the number of neurons in the hidden layer decreases, learning, at a 
certain moment, becomes impossible: 'path interference' is the phenomenon 
which causes this effect [16]. Essentially, it amounts to the following. If, in 
each layer of the feed-forward neural net, only one neuron is active at each 
time step, an input-output relation corresponds to a path of activity along the 
strongest connections between the neurons. Basically, path interference comes 
down to the erasure of an existing path of activity, which was correct at a 
previous learning step, by a change due to a punishment of a connection while 
trying to learn a different input-output relation. If the probability for this 
path interference to occur becomes too large, learning times tend to infinity. 

In this article we attempt to improve the performance of the minibrain model
of Chialvo and Bak, in the sense of decreasing the learning time, by making
sure that, at the occurrence of path interference, the punishment of a correct
activity path is no longer such that the memory is erased. We achieve this by
adding to the deinforcement term in the learning rule (6), which is proportional
to $\rho$, a Hebbian term proportional to $\eta$. The latter term has the effect
that an active path is strengthened, mitigating in this way the punishment.
By choosing the ratio between the coefficients $\eta$ and $\rho$ of both terms in the
learning rule appropriately, we are able to reduce the number of learning steps
significantly, without making the model less realistic from a biological point
of view. In fact, in the class of models we study, Hebbian learning is a most
appropriate way to account for biological observations like LTP and LTD [8].
In section 4 we explain that if the quotient of the Hebbian learning rate and
the coefficient of the deinforcement term is in the range

$$\frac{1}{4} < \frac{\eta}{\rho} < \frac{1}{2} \tag{7}$$

learning times are reduced considerably.

In their article [2], Chialvo and Bak proposed a different way to solve the 
problem of path interference. They reduced the amount of punishment of the 
connections that once had been active in a correct response. In this model 
a neuron needs to memorize whether it previously was active in a successful 
trial. In our model such a neuron memory is not needed. 

Let us denote the deinforcement contribution to learning by $\Delta w'_{ij}$ and denote
the Hebbian part by $\Delta w''_{ij}$. We will study a learning rule of the form

$$\Delta w_{ij} = \Delta w'_{ij} + \Delta w''_{ij} \tag{8}$$

From all possibilities for Hebbian learning summarized by equation (3), we
choose for $\Delta w''_{ij}$ a rule in which the coefficients $a_{ij}$ and $b_{ij}$ both are zero:

$$\Delta w''_{ij} = \varepsilon(x_i, x_j)(2x_i - 1) x_j \tag{9}$$

We choose this particular form since it can be argued that this form is a most
plausible candidate from a biological point of view [6].

Our paper has been set up as follows. In section 2, we describe a feed-forward
network with one hidden layer, which we will use to study the learning rule
(8), with (6) and (9). In section 3, numerical studies of various situations are
given and explained. It turns out, in general, that taking into account Hebbian
learning, and viewing it as a process which is permanently active, irrespective
of the occurrence of reinforcement learning, has a positive influence on the
learning time of the neural net. This is a new result, which, to the best of our
knowledge, has not been noticed before.

Fig. 1. An example of a fully connected feed-forward network with $N_I$ input
neurons, $N_H$ hidden neurons and $N_O$ output neurons. The filled circles represent
active neurons.

2 Description of the model: updating rules for neuron activities 
and connection weights 

In order to explore a simplified model of the brain we consider a fully
connected, feed-forward neural network with an input layer of $N_I$ neurons, one
hidden layer of $N_H$ neurons, and an output layer of $N_O$ neurons, see figure 1.
The state $x_i$ of neuron $i$ is 1 if neuron $i$ fires and 0 if it does not. In general, a
neuron fires if its potential $h_i$ is sufficiently high, where $h_i$ stands for the
membrane potential $V_{\mathrm{ex}} - V_{\mathrm{in}}$, the difference between the intra- and extracellular
potentials $V_{\mathrm{in}}$ and $V_{\mathrm{ex}}$. Following Chialvo and Bak [4], we model the dynamics
of the neural net by simply stating that in the hidden and output layers a
given number of neurons having the highest potentials $h_i$, in their respective
layers, are those that will fire. In their terminology: we use extremal
dynamics.
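
The extremal dynamics described above can be sketched as follows. This is a
minimal illustration, not the authors' code: the function names
`membrane_potentials` and `extremal_states` are hypothetical, thresholds are
taken to be zero, and ties between equal potentials are broken in favour of the
lowest neuron index, a choice the text does not specify.

```python
def membrane_potentials(w, x_pre):
    """Potentials h_i = sum_j w_ij * x_j of one layer, with zero thresholds.

    w[i][j] is the weight from neuron j of the previous layer to neuron i.
    """
    return [sum(wij * xj for wij, xj in zip(row, x_pre)) for row in w]


def extremal_states(potentials, n_active):
    """Return binary states: the n_active neurons with the highest
    potentials fire (state 1), all other neurons stay silent (state 0)."""
    # sorted() is stable, so equal potentials keep their index order
    # (tie-breaking by lowest index is an assumption of this sketch).
    order = sorted(range(len(potentials)),
                   key=lambda i: potentials[i], reverse=True)
    firing = set(order[:n_active])
    return [1 if i in firing else 0 for i in range(len(potentials))]
```

Applying `extremal_states(membrane_potentials(w, x), n_active)` first to the
hidden layer and then to the output layer propagates an input pattern through
the network.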

For McCulloch and Pitts neurons a way to control the average activity might
be realized via suitably chosen threshold potentials $\theta_i$ (see e.g. [1], [3]). In
nature, the average activity will depend on the threshold potentials and may,
moreover, be influenced by chemical substances or the network structure [8,10].
In our model we put the threshold potentials $\theta_i$ equal to zero:

$$\theta_i = 0 \tag{10}$$

The number of neurons in the input, hidden and output layers that we choose
to be active will be denoted by $N_I^{(a)}$, $N_H^{(a)}$ and $N_O^{(a)}$, respectively.

The input pattern, a specific set of states of neurons in the input layer, will be
denoted by $\xi_I = (\xi_{I1}, \ldots, \xi_{IN_I})$. The network is to associate every input pattern
with a desired, or target, output pattern, $\xi_T = (\xi_{T1}, \ldots, \xi_{TN_O})$. The $\xi_I$ and $\xi_T$
are vectors with components equal to 0 or 1. Consequently, the numbers of
active neurons of the input and output layers are given by

$$N_I^{(a)} = \sum_{j=1}^{N_I} \xi_{Ij} \tag{11}$$

$$N_O^{(a)} = \sum_{j=1}^{N_O} \xi_{Tj} \tag{12}$$

In our numerical experiments, these numbers will be taken to be equal. Moreover,
the number of active neurons in the hidden layer,

$$N_H^{(a)} = \sum_{j=1}^{N_H} x_{Hj} \tag{13}$$

where $x_{Hj}$ is the neuron state of neuron $j$ in the hidden layer, will also be
equal to the number of active neurons in the other layers. Hence, we choose
$N_I^{(a)} = N_H^{(a)} = N_O^{(a)}$.

We have thus made the network dynamics explicit. We now come to the update
rules for the synaptic weights $w_{ij}$. Updating of the network state will happen
at discrete time steps. At every time step $t_n$, all neuron states are updated
in the order: input layer, hidden layer, output layer. This being done, the
values of the weights are updated, according to

$$w_{ij}(t_{n+1}) = w_{ij}(t_n) + \Delta w_{ij}(t_n) \tag{14}$$

Substituting (9) and (6) into (8), we find

$$\Delta w_{ij} = \varepsilon(x_i, x_j)(2x_i - 1) x_j - (1 - r)(\rho x_i x_j - \varphi) \tag{15}$$

For the pre-factor of the Hebbian term we take [6]

$$\varepsilon(x_i, x_j) = \eta \left( \kappa - (2x_i - 1)(h_i - \theta_i) \right) \tag{16}$$

The constants $\eta$ and $\kappa$ are positive numbers, the so-called learning rate and
margin parameter. Finally, combining the above ingredients and noting that
we chose $\theta_i = 0$, the learning rule reads:

$$\Delta w_{ij}(t_n) = \eta \left[ \kappa - h_i(t_n)(2x_i(t_n) - 1) \right] \left[ 2x_i(t_n) - 1 \right] x_j(t_n) + (1 - r) \left[ -\rho\, x_i(t_n) x_j(t_n) + \varphi \right] \tag{17}$$

Note that $x_i(t_n)$ and $x_j(t_n)$ are the activities of neurons in adjacent layers,
since in our model there are no lateral connections. The constant $\varphi$ is chosen
in such a way that the change in $\sum_{i,j} w_{ij}$, where the sum is extended over $i$
and $j$ in adjacent layers, due to the $\rho$-term (not due to the Hebbian term),
is independent of $\rho$. This can easily be achieved by choosing $\varphi = \rho/P$, where
$P$ is the product of the numbers of neurons in two adjacent layers, i.e., $\varphi$ is
equal to either $\rho/(N_I N_H)$ or $\rho/(N_H N_O)$.
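
As an illustration, the learning rule (17) for the weights between two adjacent
layers might be coded as below. The function name `weight_update` and its
argument names are hypothetical; the sketch assumes zero thresholds, binary
states given as Python lists, and the choice of the constant as the punishment
rate divided by the product of the two layer sizes, as described above.

```python
def weight_update(w, x_post, x_pre, r, eta, rho, kappa=1.0):
    """Apply equation (17) in place to the weights between two adjacent layers.

    w[i][j] connects pre-synaptic neuron j to post-synaptic neuron i;
    r is the reinforcement signal (1 = success, 0 = failure).
    """
    n_post, n_pre = len(x_post), len(x_pre)
    phi = rho / (n_post * n_pre)          # phi = rho / P
    # Potentials h_i are evaluated with the weights as they were
    # before this update.
    h = [sum(w[i][j] * x_pre[j] for j in range(n_pre)) for i in range(n_post)]
    for i in range(n_post):
        sign = 2 * x_post[i] - 1          # +1 if neuron i fired, -1 otherwise
        for j in range(n_pre):
            hebb = eta * (kappa - h[i] * sign) * sign * x_pre[j]
            punish = (1 - r) * (-rho * x_post[i] * x_pre[j] + phi)
            w[i][j] += hebb + punish
    return w
```

With all weights initially zero and one active neuron on each side, a
successful step changes the active connection by `eta * kappa` and the
connections from the active pre-synaptic neuron to silent post-synaptic neurons
by `-eta * kappa`, in line with the Hebbian term of (17).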



3 Numerical Simulations 



The network described in the previous section will now first be studied
numerically. The numerical details are:

- The initial weights $w_{ij}(t_0)$ are randomly chosen with values between $-0.01$
and $+0.01$.

- The punishment rate $\rho$ will be kept constant at 0.02. Thus when we vary
the $\eta/\rho$ ratio, we vary the learning rate $\eta$.

- The margin parameter $\kappa$, appearing in (16), will be kept at the value 1.

- Whenever the number of neurons in the input, hidden or output layer is
fixed, we choose $N_I = 8$, $N_H = 512$ and $N_O = 8$.

- All data are averaged over 512 samples.

The network is confronted with $p$ different input patterns $\xi_I^\mu$ ($\mu = 1, \ldots, p$),
to which correspond equally many target patterns $\xi_T^\mu$. Learning proceeds as
follows. The input layer remains clamped to the first input pattern until the
time step at which the target pattern has been found. As soon as this input-output
relation $\mu = 1$ has been learned, we switch to input pattern $\mu = 2$.
After the corresponding target pattern has been found we continue, up to the
$p$-th input-output relation. Then, we have completed what we will refer to as
one 'learning cycle'.

After this cycle we start the process again, up to the point where the network
can recall all $p$ input-target relations at once. When that is the case, learning
stops. We count the number of learning steps needed to learn all input-output
relations.
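
The learning procedure just described can be summarized in a small,
self-contained simulation. It is a sketch under several assumptions that are
not stated in the text: all function names are hypothetical, success is taken
to mean that the output pattern equals the target pattern exactly, both weight
layers are updated with rule (17) at every step, ties in the extremal dynamics
are broken by neuron index, and `max_steps` is merely a guard against
non-terminating runs.

```python
import random


def potentials(w, x_pre):
    """h_i = sum_j w_ij x_j, with all thresholds equal to zero."""
    return [sum(wij * xj for wij, xj in zip(row, x_pre)) for row in w]


def winners(h, n_active):
    """Extremal dynamics: the n_active neurons with highest potential fire."""
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    firing = set(order[:n_active])
    return [1 if i in firing else 0 for i in range(len(h))]


def forward(w_ih, w_ho, x_in, n_active):
    """Propagate input -> hidden -> output and return both layer states."""
    hidden = winners(potentials(w_ih, x_in), n_active)
    return hidden, winners(potentials(w_ho, hidden), n_active)


def update(w, x_post, x_pre, r, eta, rho, kappa):
    """Learning rule (17) for one pair of adjacent layers, applied in place."""
    phi = rho / (len(x_post) * len(x_pre))
    h = potentials(w, x_pre)
    for i, xi in enumerate(x_post):
        sign = 2 * xi - 1
        for j, xj in enumerate(x_pre):
            w[i][j] += (eta * (kappa - h[i] * sign) * sign * xj
                        + (1 - r) * (-rho * xi * xj + phi))


def train(patterns, n_hidden, eta, rho=0.02, kappa=1.0, n_active=1,
          max_steps=20000, seed=0):
    """Return the number of learning steps until every (input, target) pair
    is recalled at once, or None if max_steps is reached first."""
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    w_ih = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)]
            for _ in range(n_hidden)]
    w_ho = [[rng.uniform(-0.01, 0.01) for _ in range(n_hidden)]
            for _ in range(n_out)]
    steps = 0
    while steps < max_steps:
        if all(forward(w_ih, w_ho, x, n_active)[1] == t for x, t in patterns):
            return steps
        for x, target in patterns:                 # one learning cycle
            while steps < max_steps:
                hidden, out = forward(w_ih, w_ho, x, n_active)
                r = 1 if out == target else 0      # global success signal
                update(w_ih, hidden, x, r, eta, rho, kappa)
                update(w_ho, out, hidden, r, eta, rho, kappa)
                steps += 1
                if r == 1:
                    break                          # go to the next pattern
    return None
```

For example, `train([([1, 0], [1, 0]), ([0, 1], [0, 1])], n_hidden=16, eta=0.0)`
runs the pure deinforcement rule on two patterns; passing a non-zero `eta`
switches on the Hebbian term. Averaging over several values of `seed` would
mimic the averaging over samples mentioned above.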

Chialvo and Bak consider the case of one active neuron in each layer. In section
3.1 we present a numerical experiment with a neural network for which the
activities are larger than one, i.e., $N_I^{(a)} > 1$, $N_H^{(a)} > 1$ and $N_O^{(a)} > 1$. In
particular we study the total number of learning steps as a function of the
ratio $\eta/\rho$. In section 3.2 we vary the number of neurons in input and output
layer and keep the hidden layer fixed, and vice versa. Finally, in section 4, we
interpret our results.

3.1 Effect of the Hebbian term

In 'Learning from mistakes' Chialvo and Bak [4] studied the case of one active
neuron in the input, the hidden and the output layers: $N_I^{(a)} = N_H^{(a)} = N_O^{(a)} = 1$.
We here will study what happens when $N_I^{(a)}$, $N_H^{(a)}$ and $N_O^{(a)}$ all are larger than
one.

In our first numerical experiment we take a network with $p = 8$ input-target
relations for which, in each input or target pattern $\mu$, the number of active
neurons is 2, i.e., $N_I^{(a)} = N_O^{(a)} = 2$. In figure 2 the number of learning
steps is plotted against the ratio $\eta/\rho$ of the two proportionality coefficients
related to the Hebbian and the deinforcement term respectively.



[Figure 2: plot of the number of learning steps (vertical axis, ticks from 1000 to 8000)
against $\eta/\rho$ (horizontal axis, ticks from 0.05 to 0.5).]

Fig. 2. The number of learning steps as a function of the quotient $\eta/\rho$. There are
eight input-target relations to be learned and two active neurons in each layer. The
number of neurons in the input, hidden and output layers are $N_I = 8$, $N_H = 512$
and $N_O = 8$. Initially, the number of learning steps increases as a result of the
Hebbian learning term, but at $\eta/\rho = 0.1$ the number of learning steps starts to
decrease dramatically from 7500 to 250 at $\eta/\rho = 0.25$. For $\eta/\rho > 0.50$, learning is
impossible.

From figure 2 we see that, when there is no Hebbian component in the learning
rule ($\eta = 0$), the net needs 2500 learning steps to learn an input-output
task. When we add a slight Hebbian component ($\eta$ small) the number of
learning steps increases, and, hence, the ability of the net to learn diminishes.
However, when the Hebbian component becomes more and more important,
the number of learning steps starts to decrease dramatically: for $\eta/\rho$ between
0.25 and 0.5 the number of learning steps is approximately 250. The Hebbian
component, which has the tendency to engrave previously learned patterns,
seems to help the net not to forget the old patterns. If $\eta/\rho$ exceeds the value 0.5,
learning fails. Apparently, the 'progressive' $\rho$ term, the power of which is to
help the network to quickly adapt itself to new input-output relations, cannot
conquer the 'conservative' power of the $\eta$-dependent (i.e., the Hebbian) term.
We will come back to these points and consider the effects of the $\eta$ and $\rho$ terms
in some detail in section 4.

3.2 Size dependences 

In this section we consider the network of figure 1 for varying numbers
$N_I = N_O$ and $N_H$.

3.2.1 Effect of varying the sizes of the input and output layers 

In this subsection we test the performance of the network for various sizes
of the input, output and hidden layers. In particular, we chose to study the
size-dependence for three different values of the learning parameter: $\eta/\rho = 0$,
$\eta/\rho = 0.10$ and $\eta/\rho = 0.45$, values which we selected on the basis of the results
of the previous subsection.

First, we take a network with the fixed number of 512 neurons in the hidden
layer, and only one active neuron per layer. The input and output layers will
consist of increasing, equal numbers of neurons, starting with $N_I = N_O = 4$.
Moreover, we choose the number of input-output relations $p$ to be learned
equal to the number of neurons in the input and output layers. The input and
output layers will be enlarged in steps of 4 neurons, up to $N_I = N_O = 28$
neurons.

In figure 3 we give the number of learning steps per pattern for the
above-mentioned three values of $\eta/\rho$. The positive effect of the addition of a Hebbian
term to the learning rule becomes more and more convincing with increasing
number of input-output relations to be learned.

3.2.2 Effect of varying the size of the hidden layer 

Next we consider a network with 8 input neurons, 8 output neurons and 8 
subsequent patterns. The number of active neurons is 2 for all input and 
target patterns. We vary the number of neurons in the hidden layer. 

In figure 4 we have plotted the number of learning steps as a function of
the number of neurons in the hidden layer for three values of the quotient
$\eta/\rho$. Note that, in agreement with figure 2, the number of learning steps is
the largest for $\eta/\rho = 0.1$ (the symbols □ in figure 4). A suitably chosen value
for the coefficient $\eta$ of the Hebbian term makes it possible for the network to
perform satisfactorily with a very small number of neurons in the hidden layer
(the symbols ○ in the figure).

Fig. 3. The number of learning steps per pattern as a function of the number of
input-output relations $p$, for $\eta/\rho = 0$ (■), $\eta/\rho = 0.1$ (□) and $\eta/\rho = 0.45$ (○). Input
and output patterns have only one active neuron. The number of neurons in input
and output layers equals the number of patterns $p$. Note the logarithmic scale of
the vertical axis. For a small number of input-output patterns, the learning time
is roughly equal for different values of $\eta/\rho$. The advantageous effect of a Hebbian
term in the learning rule for this learning task becomes more and more pronounced
with increasing numbers of input-target relations.

Fig. 4. Dependence of the number of learning steps on the number of neurons in the
hidden layer of the network. The symbols (■), (□) and (○) correspond to $\eta/\rho = 0$,
$\eta/\rho = 0.1$ and $\eta/\rho = 0.45$ respectively. All input patterns and output patterns have
2 active neurons. The number of input neurons, output neurons and patterns are
fixed: $N_I = 8$, $N_O = 8$, $p = 8$.

Fig. 5. Path interference. At both the time steps $t_p$ and $t_q$ the neuron C of the hidden
layer fires, and as a result the same neuron D of the output layer is activated. This
unwanted effect is due to the fact that the connection of B and C happens to be
the largest one. The paths ACD and BCD partially overlap.

4 Explanation of the effect of the Hebbian term 

The different behavior for different values of $\eta/\rho$ is mainly due to its
consequences for the effect we call path interference, after Wakeling [16], who
studied the critical behavior of the Chialvo & Bak minibrain.

As an example, let us consider the case in which only one neuron is active in
each layer. In this case, the 'path of activity' from the active input neuron to
the corresponding output neuron runs along the outgoing connections with the
largest weights. During the learning process, it is possible that an established
path (connecting, e.g., the active neuron of pattern $\xi_I$ with the active neuron,
in the output layer, of $\xi_O$) is 'wiped out' by an attempt to learn one of the
other input-output relations. This is likely to happen if the same neuron in
the hidden layer becomes active, and, consequently, the connection to the
output neuron corresponding to the previously learned pattern is punished
by an amount $\rho$. This phenomenon of path interference will occur once in
a while, irrespective of the values of the parameters $\eta$ and $\rho$. However, the
question whether the memory of the old pattern is wiped out (i.e., whether
the connection to the output neuron under consideration is no longer the
largest) does depend on the parameters $\rho$ and $\eta$. To find out how, we should
consider the change of this connection compared to the change of the other
connections from the same hidden neuron to the other output neurons. For
the total relative change, two different learning steps should be taken into
account. Firstly, the one at $t_p$, at which the right output was found, and
the deinforcement term did not contribute, and, secondly, the learning step
at $t_q$, at which path interference occurred, and the deinforcement term did
contribute.

Let $w_{\mathrm{win}}$ be the largest outgoing weight from the active neuron in the hidden
layer to the output layer, and let $w_{\mathrm{com}}$ be a weight value which is representative
for one of the other, competing weights connecting the same neuron in the
hidden layer to a different neuron in the output layer.

The membrane potential of each neuron $i$ of the output layer is given,
according to (4), by $h_i = w_{ij}$, where $j$ is the active neuron of the hidden layer. From
(17), with $x_j = 1$, we find in case of success ($r = 1$) for the changes of the
connections to the winning ($x_i = 1$) and the competing ($x_i = 0$) neurons in the
output layer:

$$\Delta w_{\mathrm{win}}(t_p) = \eta \left[ \kappa - w_{\mathrm{win}}(t_p) \right] \tag{18}$$

$$\Delta w_{\mathrm{com}}(t_p) = -\eta \left( \kappa + w_{\mathrm{com}}(t_p) \right) \tag{19}$$

respectively. Similarly, in case of failure ($r = 0$) these changes are

$$\Delta w_{\mathrm{win}}(t_q) = \eta \left( \kappa - w_{\mathrm{win}}(t_q) \right) - \rho + \varphi \tag{20}$$

$$\Delta w_{\mathrm{com}}(t_q) = -\eta \left( \kappa + w_{\mathrm{com}}(t_q) \right) + \varphi \tag{21}$$

respectively. Only if the increase of $w_{\mathrm{com}}$ is larger than the increase of $w_{\mathrm{win}}$
can the memory path be wiped out, since then $w_{\mathrm{com}}$ may become the largest
weight, i.e., if

$$\Delta w_{\mathrm{com}}(t_p) + \Delta w_{\mathrm{com}}(t_q) > \Delta w_{\mathrm{win}}(t_p) + \Delta w_{\mathrm{win}}(t_q) \tag{22}$$



We now substitute (18)-(21) into (22). In the resulting inequality we can ignore
the values of $w_{\mathrm{win}}$ and $w_{\mathrm{com}}$ relative to $\kappa$ as long as the number of adaptations
of $w_{\mathrm{win}}$ and $w_{\mathrm{com}}$ is small; recall that $\kappa = 1$, $\rho = 0.02$, and the initial values of
the weights are in the range $[-0.01, 0.01]$ in our numerical experiments. With
these approximations, the inequality reduces to $\rho > 4\eta$. In the opposite case,

$$\rho < 4\eta \tag{23}$$

$w_{\mathrm{win}}$ will remain larger than $w_{\mathrm{com}}$ and, consequently, path interference will
not wipe out learned input-output relations, which explains the decrease of
the number of learning steps for $\eta > \rho/4$. For $\eta > \rho/2$ the network is incapable
of learning input-output relations. This can be seen as follows. Each time a
winning connection is punished, i.e., the output is wrong, it changes approximately
by an amount $\eta - \rho$, whereas the competing connection changes by an
amount of $-\eta$. Hence, only when $\eta - \rho < -\eta$, or, equivalently, when

$$2\eta < \rho \tag{24}$$

the winning connection decreases more than the competing connection. In
the opposite case, $2\eta > \rho$, the winning connection remains larger than its
competitor, and, at the next learning step, the output will be wrong again.

Combining the inequalities (23) and (24), we find the central result of this
article (7), the parameter region for which a Hebbian term in the learning rule
is advantageous. This observation is confirmed by the numerical experiment
of section 3.1, in particular figure 2.
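
For readers who wish to follow the algebra, the reduction of (22) can be
written out as follows. This is only a worked restatement of the argument
above, under the same approximation that $w_{\mathrm{win}}$ and
$w_{\mathrm{com}}$ are negligible compared to $\kappa$; the final numbers use
the values $\kappa = 1$ and $\rho = 0.02$ quoted in section 3.

```latex
\begin{align*}
\Delta w_{\mathrm{com}}(t_p) + \Delta w_{\mathrm{com}}(t_q)
  &\approx -\eta\kappa + (-\eta\kappa + \varphi) = -2\eta\kappa + \varphi, \\
\Delta w_{\mathrm{win}}(t_p) + \Delta w_{\mathrm{win}}(t_q)
  &\approx \eta\kappa + (\eta\kappa - \rho + \varphi) = 2\eta\kappa - \rho + \varphi, \\
-2\eta\kappa + \varphi > 2\eta\kappa - \rho + \varphi
  &\iff \rho > 4\eta\kappa .
\end{align*}
% With \kappa = 1 the memory is preserved when \rho < 4\eta, i.e. (23).
% Together with (24), 2\eta < \rho, this gives
%   \rho/4 < \eta < \rho/2,
% which for \rho = 0.02 corresponds to 0.005 < \eta < 0.01.
```

The constant $\varphi$ drops out because it appears on both sides of the
inequality.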

Note that the reasoning leading to the main results (23) and (24) was based
on an assumption regarding the initial values. In particular, it was assumed
that the weights were small compared to $\kappa$ (which was put equal to 1). In
reference [6] it was shown that the pre-factor (16) of the Hebbian term tends
to zero during the learning process:

$$\kappa - (2x_i - 1)(h_i - \theta_i) \to 0 \tag{25}$$

implying that for a small number of active neurons the absolute values of the
weights are of the order $\kappa$, as follows with (4) and (10). Hence, the assumption
that the weights are small compared to $\kappa$ ($\kappa = 1$) is guaranteed to break down
at a certain point in the learning process.



5 Summary 

We have shown, in a particular model, that a Hebbian component in a rein- 
forcement rule improves the ability to learn input-output relations by a neural 
net. 